This paper is concerned with the problem of low rank plus sparse matrixdecomposition for big data. Conventional algorithms for matrix decompositionuse the entire data to extract the low-rank and sparse components, and arebased on optimization problems with complexity that scales with the dimensionof the data, which limits their scalability. Furthermore, existing randomizedapproaches mostly rely on uniform random sampling, which is quite inefficientfor many real world data matrices that exhibit additional structures (e.g.clustering). In this paper, a scalable subspace-pursuit approach thattransforms the decomposition problem to a subspace learning problem isproposed. The decomposition is carried out using a small data sketch formedfrom sampled columns/rows. Even when the data is sampled uniformly at random,it is shown that the sufficient number of sampled columns/rows is roughlyO(r\mu), where \mu is the coherency parameter and r the rank of the low rankcomponent. In addition, adaptive sampling algorithms are proposed to addressthe problem of column/row sampling from structured data. We provide an analysisof the proposed method with adaptive sampling and show that adaptive samplingmakes the required number of sampled columns/rows invariant to the distributionof the data. The proposed approach is amenable to online implementation and anonline scheme is proposed.
展开▼